Reuse rollout token counts across limit checks by xeophon · Pull Request #1799 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-21T08:59:44Z

Overview

Reduce synchronous token-limit overhead by avoiding repeated reconstruction of the same derived branch paths. The trace node graph remains the source of truth, limit precedence and soft-cap behavior are unchanged, and the commit only changes the production interception server.

What changed

Return immediately when no token cap is configured.
Count directly from Trace.nodes when canonical append order proves the graph is a single root-to-leaf path.
Materialize Trace.branches once for compacted, subagent, or otherwise non-linear graphs, then reuse that view across enabled input, output, and total token checks.
Preserve the existing max_turns → input → output → total precedence and >= boundaries.
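The control flow described in the list above can be sketched as follows. This is a minimal illustration, not the code in verifiers/v1/interception/server.py: the `Limits` class, its field names, the `count_tokens` callback, and the string return values are all hypothetical. It shows only the precedence order, the `>=` boundaries, the early return, and the count-once reuse.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Counts = Tuple[int, int, int]  # (input, output, total) token counts


@dataclass
class Limits:
    """Hypothetical stand-in for the limit configuration, not the real class."""

    max_turns: Optional[int] = None
    max_input_tokens: Optional[int] = None
    max_output_tokens: Optional[int] = None
    max_total_tokens: Optional[int] = None

    def reached(self, turns: int, count_tokens: Callable[[], Counts]) -> Optional[str]:
        # The turn cap is checked first and needs no token counting.
        if self.max_turns is not None and turns >= self.max_turns:
            return "max_turns"
        caps = (self.max_input_tokens, self.max_output_tokens, self.max_total_tokens)
        # Early return: with no token cap configured, the trace is never touched.
        if all(cap is None for cap in caps):
            return None
        # Count once, then reuse the same numbers for every enabled cap.
        counts = count_tokens()
        for name, cap, used in zip(("input", "output", "total"), caps, counts):
            if cap is not None and used >= cap:
                return name
        return None
```

Passing the counter as a callable keeps the expensive step lazy: it runs at most once per check, and not at all when only `max_turns` is set.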

Why

Trace.nodes stores each message once, while Trace.branches is an uncached derived view built by finding leaves and walking each parent chain. The previous input, output, and total properties each requested that view independently. When several token caps were enabled and still below their thresholds, the same graph paths were reconstructed up to three times.
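To make the cost concrete, here is a toy model of an uncached derived view. The `Node` and `Trace` classes below are assumptions written for illustration and only mimic the shape described in this paragraph (leaf discovery plus parent-chain walks). The `builds` counter is not part of the real trace; it exists only to show how often the view is reconstructed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    parent: Optional[int]  # index of the parent in Trace.nodes, None for a root
    tokens: int


@dataclass
class Trace:
    nodes: List[Node] = field(default_factory=list)
    builds: int = 0  # illustration only: counts rebuilds of the derived view

    @property
    def branches(self) -> List[List[Node]]:
        """Uncached derived view: one root-to-leaf path per leaf."""
        self.builds += 1
        has_child = {n.parent for n in self.nodes if n.parent is not None}
        paths = []
        for i in range(len(self.nodes)):
            if i in has_child:
                continue  # not a leaf
            path, cur = [], i
            while cur is not None:  # walk the parent chain back to the root
                path.append(self.nodes[cur])
                cur = self.nodes[cur].parent
            paths.append(path[::-1])
        return paths


trace = Trace([Node(None, 4), Node(0, 3), Node(0, 2)])  # one root, two leaves

# Three independent reads rebuild the view three times.
for _ in range(3):
    sum(n.tokens for branch in trace.branches for n in branch)
rebuilds_before = trace.builds

# One snapshot serves all three checks.
trace.builds = 0
snapshot = trace.branches
totals = [sum(n.tokens for branch in snapshot for n in branch) for _ in range(3)]
rebuilds_after = trace.builds
```

In this toy model `rebuilds_before` is 3 and `rebuilds_after` is 1, while the summed values are identical either way.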

The canonical linear case can safely use the existing node order without allocating a branch view. Other graph shapes continue through the established branch abstraction, but share one snapshot rather than rebuilding it for every count. This keeps arbitrary node ordering and branching semantics on the existing path.

Performance

Measurements use median time.perf_counter() wall time with GC before each repetition; peak Python allocation is measured separately with tracemalloc so tracing overhead does not affect timings.
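The benchmark scripts are not part of the commit, so the harness below is only a sketch of the stated methodology, using standard-library calls. The function names and the `repeats` default are assumptions.

```python
import gc
import statistics
import time
import tracemalloc
from typing import Callable


def median_wall_ms(fn: Callable[[], object], repeats: int = 21) -> float:
    """Median wall time in milliseconds, collecting garbage before each run."""
    samples = []
    for _ in range(repeats):
        gc.collect()
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def peak_alloc_mib(fn: Callable[[], object]) -> float:
    """Peak Python allocation in MiB, measured in a separate traced run."""
    gc.collect()
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 * 1024)


def workload() -> list:
    # Placeholder workload; a real run would call the limit check instead.
    return [i for i in range(10_000)]


timing_ms = median_wall_ms(workload, repeats=5)
peak_mib = peak_alloc_mib(workload)
```

Keeping the `tracemalloc` run separate from the timed runs matters because tracing hooks every allocation and would otherwise inflate the wall-time samples.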

Workload	Before	After	Time saved	Peak allocation (before → after)
200k-node linear stress case, 1 token/node	88.784 ms	23.526 ms	65.258 ms (73.5%)	12.00 MiB → < 0.01 MiB (below display precision)
2k nodes / 1k shared-trunk branches, 1 token/node	217.673 ms	125.654 ms	92.019 ms (42.3%)	8.23 MiB → 8.23 MiB
2k-node linear long-horizon case, 32 tokens/node	0.732 ms	0.315 ms	0.417 ms (57.0%)	0.16 MiB → < 0.01 MiB (below display precision)
2k nodes / 10 branches, 32 tokens/node	5.341 ms	3.600 ms	1.741 ms (32.6%)	0.19 MiB → 0.19 MiB
10k nodes / 10 branches, 32 tokens/node	26.767 ms	18.172 ms	8.595 ms (32.1%)	0.93 MiB → 0.93 MiB

The branched peak remains unchanged because both paths hold at most one full branch snapshot at a time; the saving is reduced allocation churn and graph-walk CPU from eliminating additional snapshots. Since limit checks run synchronously from the interception session, the wall-time reduction also shortens the corresponding event-loop stall.

At higher token density, mask summation becomes a larger share of the work: the 2k-node / 10-branch case at 128 tokens per node measured 8.711 ms → 6.964 ms, saving 1.747 ms (20.1%). This is expected because the change targets graph reconstruction rather than token-mask arithmetic.
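The reported savings can be rechecked from the before and after figures alone. The snippet below recomputes them, with the timings copied from the measurements above; the dictionary keys are shortened labels and the helper name is arbitrary.

```python
# (before_ms, after_ms) pairs copied from the reported measurements.
measurements = {
    "200k-node linear, 1 token/node": (88.784, 23.526),
    "2k nodes / 1k branches, 1 token/node": (217.673, 125.654),
    "2k-node linear, 32 tokens/node": (0.732, 0.315),
    "2k nodes / 10 branches, 32 tokens/node": (5.341, 3.600),
    "10k nodes / 10 branches, 32 tokens/node": (26.767, 18.172),
    "2k nodes / 10 branches, 128 tokens/node": (8.711, 6.964),
}


def saving(before_ms: float, after_ms: float) -> tuple:
    """Return (milliseconds saved, percentage of the original time saved)."""
    saved = before_ms - after_ms
    return round(saved, 3), round(100.0 * saved / before_ms, 1)


for name, (before, after) in measurements.items():
    saved_ms, pct = saving(before, after)
    print(f"{name}: {saved_ms:.3f} ms saved ({pct:.1f}%)")
```

The absolute saving for the 2k-node / 10-branch graph is nearly the same at 32 and 128 tokens per node (1.741 ms and 1.747 ms). That is consistent with a fixed graph-reconstruction cost being removed while the mask-summation cost grows with token density.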

Scope

The commit contains only verifiers/v1/interception/server.py. Benchmark scripts, focused test scaffolding, project metadata, and lockfiles are intentionally excluded.

Note

Low Risk
Single-method performance refactor in limit checking; semantics are intended to match prior branch-based aggregation with lower allocation and graph-walk cost.

Overview
RolloutLimits.reached in the v1 interception server now evaluates token caps with less repeated work, without changing limit order (max_turns → input → output → total) or >= stop behavior.

When no token limits are set, it returns immediately after the turn check. For traces whose nodes form a single linear chain (each node’s parent is the previous index), it counts from trace.nodes directly instead of building trace.branches. For branched or non-canonical graphs, it materializes trace.branches once and sums prompt_len, completion_len, and total_tokens across branches for all enabled caps. This replaces the separate trace.prompt_len, trace.completion_len, and trace.total_tokens reads, each of which reconstructed the branch views independently.
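The linear fast path relies on one structural test: node 0 is the only root and every later node points at its predecessor. A minimal sketch of that test and of counting straight from the node list follows. The `Node` layout is an assumption, in particular the `mask` field and the convention that a true entry marks a completion token.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    parent: Optional[int]  # index of the parent node, None for the root
    token_ids: List[int]
    mask: List[bool]  # assumption: True marks a completion (model-generated) token


def is_canonical_chain(nodes: Sequence[Node]) -> bool:
    """True when node 0 is a root and node i's parent is node i - 1."""
    return all(
        node.parent == (i - 1 if i else None) for i, node in enumerate(nodes)
    )


def count_linear(nodes: Sequence[Node]) -> Tuple[int, int, int]:
    """Return (input, output, total) token counts without building branch paths."""
    total = sum(len(node.token_ids) for node in nodes)
    output = sum(sum(node.mask) for node in nodes)
    return total - output, output, total


chain = [
    Node(None, [1, 2, 3], [False, False, False]),  # prompt
    Node(0, [4, 5], [True, True]),                 # completion
    Node(1, [6], [False]),                         # follow-up prompt
]
forked = chain + [Node(0, [7], [True])]  # second child of node 0 breaks the chain
```

Here `is_canonical_chain(chain)` holds, so `count_linear(chain)` can be used directly, while `forked` fails the test and would go through the branch-based path instead.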

Reviewed by Cursor Bugbot for commit 5f30b53.

Note

Reuse rollout token counts across limit checks in `RolloutLimits.reached`

Adds an early return in RolloutLimits.reached when all token caps are None, avoiding unnecessary computation.
For traces forming a single linear chain, computes token counts directly from node-level data (node.token_ids lengths and masked token counts) rather than trace-level aggregates.
For non-linear (branched) traces, replaces single trace-level aggregates (trace.prompt_len, etc.) with sums across trace.branches.
Behavioral Change: token cap comparisons now use different aggregation paths depending on graph topology, which may produce different values than before for branched traces.

Macroscope summarized 5f30b53.

macroscopeapp · 2026-06-21T09:07:28Z

Approvability

Verdict: Needs human review

Changes token limit enforcement logic from single-trace properties to summing across branches, altering when limits trigger. This gates whether rollouts continue and introduces new computation paths that warrant human review.


mikasenghaas

this seems p minor?

xeophon changed the base branch from feat/nano-as-v1 to main June 23, 2026 04:10

Reuse rollout token counts across limit checks

5f30b53

xeophon force-pushed the codex/reuse-rollout-token-counts branch from af6db01 to 5f30b53 June 23, 2026 04:17

xeophon requested a review from mikasenghaas June 23, 2026 04:25

mikasenghaas reviewed Jun 23, 2026


xeophon wants to merge 1 commit into main from codex/reuse-rollout-token-counts
